Environment modulates protein heterogeneity through transcriptional and translational stop codon readthrough

Stop codon readthrough events give rise to longer proteins, which may alter the protein’s function, thereby generating short-lasting phenotypic variability from a single gene. To systematically assess the frequency and origin of stop codon readthrough events, we designed a library of reporters. We introduced premature stop codons into mScarlet, which enabled high-throughput quantification of protein synthesis termination errors in E. coli using fluorescence microscopy. We found that under stress conditions, stop codon readthrough may occur at rates as high as 80%, depending on the nucleotide context, suggesting that evolution frequently samples stop codon readthrough events. Analysis of selected reporters by mass spectrometry and RNA-seq showed that transcription errors, and not only translation errors, contribute to stop codon readthrough. The RNA polymerase was more likely to misincorporate a nucleotide at premature stop codons. Proteome-wide detection of stop codon readthrough by mass spectrometry revealed that temperature regulated the expression of cryptic sequences generated by stop codon readthrough in E. coli. Overall, our findings suggest that the environment affects the accuracy of protein production, which increases protein heterogeneity when organisms need to adapt to new conditions.

with a median fluorescence higher than the threshold defined as the median fluorescence plus two standard deviations of the NC. B) Box plots summarising fluorescence distributions of E. coli cells expressing nine selected reporters grown under various conditions, highlighting: i) increased errors in minimal media (M9) compared to rich media (LB), ii) TAA as the most accurate stop codon and TGA as the least accurate, and iii) more SCR events at low temperatures and in minimal media. The Y-axis represents median fluorescence relative to the positive control (PC). The box extends from the 25th to the 75th percentile of the data, and the horizontal line inside the box represents the 50th percentile. The whiskers extend from the minimum to the maximum value within 1.5 times the interquartile range (IQR) from the lower and upper quartiles, respectively. C) mScarlet thermostability assay. mScarlet remained functional, i.e., fluorescent, up to 70°C (mean and standard deviation of three replicates are shown). Source data is provided as Source data file.

Supplementary Figure 4. Wilcoxon one-sided test shows statistical evidence of a temperature-driven effect on SCR: 18°C > 25°C > 30°C ~ 37°C > 42°C when a stop codon is inserted at positions 105 and 155. The replicate with the highest cell count was analysed, and replicates with fewer than 200 cells were excluded. The test evaluated whether one cell distribution significantly exceeded another. Decreasing temperature correlates with increased SCR rates, especially at higher SCR rates (i.e., at 18°C and with the TGA stop codon). Source data is provided as Source data file.

Supplementary Figure 10. Stop codon readthrough events occur evenly along the mScarlet sequence. The SCR score describes the likelihood of an SCR event at a given position (see Methods section for further details about the calculation of the SCR score). There is no significant correlation between SCR score and mScarlet amino acid position. We excluded four positions with a discordance of over 30% between His-tag and fluorescence signals (dark reporters). Source data is provided as Source data file.
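The detection rule and box-plot conventions in the captions above (a threshold equal to the median NC fluorescence plus two standard deviations, values expressed relative to the PC, whiskers at 1.5 × IQR) can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline: the function names and the per-cell values are hypothetical, and we assume the sample standard deviation of the NC cells and quartiles computed by linear interpolation.

```python
import statistics


def scr_threshold(nc_cells):
    """Threshold: median NC fluorescence plus two (sample) standard deviations of the NC."""
    return statistics.median(nc_cells) + 2 * statistics.stdev(nc_cells)


def is_scr_positive(reporter_cells, nc_cells):
    """Assumed rule: a reporter is SCR-positive if its median exceeds the NC threshold."""
    return statistics.median(reporter_cells) > scr_threshold(nc_cells)


def relative_to_pc(reporter_cells, pc_cells):
    """Median reporter fluorescence relative to the median of the positive control."""
    return statistics.median(reporter_cells) / statistics.median(pc_cells)


def whiskers(values):
    """Box-plot whiskers: most extreme data points within 1.5 x IQR of the quartiles."""
    # 'inclusive' quartiles use linear interpolation between order statistics
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    return min(inside), max(inside)


# Hypothetical per-cell fluorescence values (arbitrary units)
nc = [100, 110, 95, 105, 90]
pc = [1000, 1100, 900]
reporter = [150, 160, 140]

print(round(scr_threshold(nc), 2))    # 115.81
print(is_scr_positive(reporter, nc))  # True
print(relative_to_pc(reporter, pc))   # 0.15
print(whiskers([1, 2, 3, 4, 100]))    # (1, 4)
```

The outlier (100) in the last call falls outside the upper limit, so the whisker stops at the largest remaining value, matching the caption's description.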

Supplementary Figure 11. The impact of predicted mRNA secondary structure on SCR. A) The predicted minimum free energy (MFE) is shown for the most stable secondary structure elements in a 100-nt window upstream of the premature stop codon at different temperatures. Decreasing the temperature stabilized the predicted RNA secondary structures. There is no correlation between the stability of the predicted RNA secondary structures and the likelihood of SCR (SCR score; see Methods section for further details about the calculation of the SCR score). B) The predicted MFE is shown for the most stable secondary structure elements in a 100-nt window downstream of the premature stop codon at different temperatures. Decreasing the temperature stabilized the predicted RNA secondary structures. However, there is no correlation between the stability of the predicted RNA secondary structures and the likelihood of SCR. C) There are no significant differences in the stability (MFE) of the predicted RNA secondary structures among reporters with no SCR errors, a mid tendency to SCR errors, and a high tendency to SCR errors (see Methods section for further details about the binning of the SCR score into three categories). D) The local thermodynamic stability of the secondary structures (i.e., the number of all base pairs, G-C pairs, and the longest stretch of consecutive pairs) in 5-, 10-, 20-, and 50-nt windows upstream of the premature stop codon does not correlate with the likelihood of SCR events. E) The local thermodynamic stability of the secondary structures (i.e., the number of all base pairs, G-C pairs, and the longest stretch of consecutive pairs) in 5-, 10-, 20-, and 50-nt windows downstream of the premature stop codon does not correlate with the likelihood of SCR events. Source data is provided as Source data file.

Note that DIA-NN does not take into account the ions b1, b2, y1, and y2, even when predicted to be intense, due to a high likelihood of interference. For this reason, the dot product calculated by Skyline is not applicable and is not shown. The right panels show chromatograms (with Savitzky-Golay smoothing) of fragments in the output spectral library from a single replicate, with peak boundaries as reported in the DIA-NN main output table, and the mass error (in ppm) for the displayed replicate as calculated by Skyline.
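Panels D and E of Supplementary Figure 11 summarise local structure as the number of base pairs, G-C pairs, and the longest stretch of consecutive pairs in windows next to the stop codon. The Python sketch below computes such window statistics from a dot-bracket structure, assuming one has already been predicted by an RNA folding tool. The definitions are our assumptions, not the authors' Methods: it counts paired nucleotides (not pairs), treats a "stretch" as consecutive paired positions, and uses hypothetical function names and a toy hairpin.

```python
def pair_table(dot_bracket):
    """Map each position to its base-pair partner (or None) from dot-bracket notation."""
    partner = [None] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def window_stats(seq, dot_bracket, start, end):
    """Paired nucleotides, G-C paired nucleotides and longest run of consecutive
    paired nucleotides in seq[start:end] (0-based, end exclusive)."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_table(dot_bracket)
    paired = gc = longest = run = 0
    for i in range(max(start, 0), min(end, len(seq))):
        j = partner[i]
        if j is None:
            run = 0  # an unpaired base breaks the stretch
            continue
        paired += 1
        if {seq[i].upper(), seq[j].upper()} == {"G", "C"}:
            gc += 1
        run += 1
        longest = max(longest, run)
    return paired, gc, longest


# Hypothetical 14-nt example: a 4-bp hairpin followed by a UGA stop codon at index 11
seq = "GGGCAAAGCCCUGA"
structure = "((((...))))..."
stop_index = 11
for w in (5, 10):
    print(w, window_stats(seq, structure, stop_index - w, stop_index))
# 5 (4, 4, 4)
# 10 (7, 7, 4)
```

A downstream window would use `start = stop_index + 3` and `end = stop_index + 3 + w` under the same assumptions.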
Supplementary Figure 16. Stop codon readthrough events may be more likely among genes within multi-gene operons. Using the RegulonDB database 2, we classified the E. coli genes into three categories: 1) single-gene operons, i.e., genes in operons that contain only one gene; 2) final multi-gene operons, i.e., genes expressed last within a multi-gene operon; and 3) within multi-gene operons, i.e., genes not expressed last within a multi-gene operon. A) Genes within multi-gene operons are enriched in TGA, the most error-prone stop codon, and depleted in TAA, the most accurate, compared to genes in single-gene and final multi-gene operons. B) Genes within multi-gene operons are enriched in G and depleted in T at the first nucleotide position downstream of the TGA stop codon. We found that the presence of T increases and G decreases the protein synthesis termination efficiency (Fig 3B and 3C). Source data is provided as Source data file.
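The three-way operon classification and the stop-codon tallies described for Supplementary Figure 16 can be sketched as below. This is a minimal illustration on hypothetical toy data, not a RegulonDB parser: we assume each operon is given as a list of gene names in transcription order, and that a gene appearing in several operons keeps its last assignment (a simplification).

```python
from collections import Counter, defaultdict


def classify_genes(operons):
    """Assign each gene to 'single', 'final' or 'within' based on its operon.

    Each operon is a list of gene names ordered in the direction of transcription.
    If a gene occurs in several operons, the last assignment wins (a simplification).
    """
    category = {}
    for genes in operons:
        if len(genes) == 1:
            category[genes[0]] = "single"
            continue
        for gene in genes[:-1]:
            category[gene] = "within"
        category[genes[-1]] = "final"
    return category


def stop_context_usage(category, stop_context):
    """Tally stop codons, and the nucleotide following TGA, per operon category.

    stop_context maps gene -> (stop codon, first nucleotide downstream of it).
    """
    stops = defaultdict(Counter)
    after_tga = defaultdict(Counter)
    for gene, cat in category.items():
        codon, next_nt = stop_context[gene]
        stops[cat][codon] += 1
        if codon == "TGA":
            after_tga[cat][next_nt] += 1
    return stops, after_tga


# Hypothetical toy data (not RegulonDB records)
operons = [["a"], ["b", "c", "d"], ["e", "f"]]
stop_context = {
    "a": ("TAA", "T"), "b": ("TGA", "G"), "c": ("TGA", "G"),
    "d": ("TAA", "T"), "e": ("TAG", "A"), "f": ("TGA", "T"),
}
stops, after_tga = stop_context_usage(classify_genes(operons), stop_context)
print(dict(stops["within"]))      # {'TGA': 2, 'TAG': 1}
print(dict(after_tga["within"]))  # {'G': 2}
```

Dividing each counter by its total gives per-category frequencies that can then be compared between categories.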
Supplementary Figure 18. Agreement between indexed retention times (iRT) predicted by Prosit 3 and iRT calculated from measured retention times. For all precursors reported by DIA-NN, indexed retention times were predicted with Prosit. The plot shows the iRT predicted by Prosit vs. the iRT calculated by DIA-NN, as given in its main output table. Each dot represents a precursor; SCR precursors are shown in blue. The outlier SCR precursor in the bottom right part, eluting much earlier than predicted, was removed from the analysis. Source data is provided as Source data file.

Supplementary Table 2. Wilcoxon one-sided testing, adjusted using the Bonferroni method, shows statistical evidence of a temperature-driven effect on SCR, analysing the replicate with the highest cell count and excluding replicates with fewer than 200 cells. The test assessed whether one cell distribution was significantly greater than another. Decreasing temperature leads to higher SCR rates in a non-linear fashion, especially notable at 18°C. This effect is more pronounced at higher SCR rates and may be less apparent at low SCR rates. Source data is provided as Source data file.

Supplementary Table 4. Mass spectrometry analysis of the reporters revealed that stop codon readthrough was primarily related to amino acid misincorporation. TGA was almost always replaced by tryptophan (W) and, at lower frequency, by cysteine (C), glutamic acid (E), aspartic acid (D), and methionine (M). TAG was mainly replaced by glutamine (Q) and tyrosine (Y); misincorporation of alanine (A), serine (S), tryptophan (W), and lysine (K) was a minor process. TAA was mainly replaced by glutamine (Q), tyrosine (Y), and alanine (A); misincorporation of serine (S) and lysine (K) at the TAA position was a minor process. Major incorporations (relative abundance >10%) are shown in bold.

1 - Only peptides covering the mutated position are shown; X or K (for Lys) designates the misincorporated amino acid.
2 - Calculated as described in Materials and Methods for the corresponding forms of the peptide comprising the mutated position; K was not included.
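Supplementary Table 4 flags "major incorporations" as those with a relative abundance above 10%. The exact calculation is in Materials and Methods and is not reproduced here. The sketch below only assumes that relative abundance is each amino acid's share of the summed intensity of the peptide forms covering the stop-codon position, with excluded residues (such as K, per footnote 2) dropped before normalising. The function names and intensities are hypothetical.

```python
def relative_abundance(intensities, exclude=()):
    """Share of the summed intensity attributed to each amino acid at the stop-codon position.

    intensities maps amino acid -> summed intensity of the peptide form carrying it.
    Amino acids listed in `exclude` are dropped before normalising (assumption).
    """
    kept = {aa: value for aa, value in intensities.items() if aa not in exclude}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no intensity left after exclusion")
    return {aa: value / total for aa, value in kept.items()}


def major_incorporations(abundance, cutoff=0.10):
    """Amino acids above the cutoff (10% by default), most abundant first."""
    return sorted((aa for aa, f in abundance.items() if f > cutoff),
                  key=lambda aa: -abundance[aa])


# Hypothetical intensities for peptide forms covering a TGA position
intensities = {"W": 800.0, "C": 120.0, "E": 50.0, "D": 30.0, "K": 500.0}
abundance = relative_abundance(intensities, exclude=("K",))
print(major_incorporations(abundance))  # ['W', 'C']
```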
Supplementary Table 5. Mass spectrometric analysis of the E. coli proteome identified peptides from non-coding regions resulting from stop codon readthrough events. Peptides indicating SCR events are listed with modification and charge state as reported by DIA-NN. TGA was a more error-prone stop codon than TAA. We did not detect any evidence of SCR in proteins terminated by TAG, probably due to the low representation of TAG in the E. coli genome (8%). We detected more cases of SCR in E. coli samples grown at 18°C than at 37°C (14 vs. 11; see also Fig 2A and C).

Supplementary Figure 5. Wilcoxon one-sided test shows statistical evidence of a non-linear temperature-driven effect on SCR: 18°C > 25°C ~ 37°C > 42°C. We assessed the median fluorescence relative to the wild-type for all the reporters studied in LB and M9 media (dataset from Fig S2), excluding those with no SCR at any of the studied temperatures. Number of data points at each temperature = 228; mean at 18°C = 5.50, at 25°C = 0.89, at 37°C = 0.58, and at 42°C = 0.38. Source data is provided as Source data file.

Supplementary Figure 6. Visualization and quantification of stop codon readthrough events in E. coli grown in different media. Fluorescence distributions displayed by E. coli cells transformed with each library reporter and grown at 37°C in minimal media supplemented with A) a high carbon source concentration (M9 media with 1.6% glycerol, 0.2% casamino acids, 1 mM thiamine hydrochloride, 2 mM MgSO4, and 0.1 mM CaCl2) and B) a high casamino acid concentration (M9 media with 0.4% glycerol, 0.4% casamino acids, 1 mM thiamine hydrochloride, 2 mM MgSO4, and 0.1 mM CaCl2). Each distribution is derived from one replicate of 20 to 10,967 cells. Source data is provided as Source data file.

Supplementary Figure 7. Detection of stop codon readthrough by analysing His-tag expression with His-tag antibodies in bands corresponding to the expected size of full-length mScarlet. In-gel fluorescence signals of the positive control (PC, cells expressing mScarlet) are shown in red. The His-tag signals of the PC, the negative control (NC, cells carrying an empty vector), and the cells expressing the reporters are shown in green. The numbers below the images indicate the quantification of His-tag expression as explained in the Methods section. Cells were grown in rich media at 18°C (see Methods). Full-sized gels are shown in Figure S8.
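The one-sided Wilcoxon comparisons in Supplementary Figures 4 and 5 ask whether values at one temperature tend to exceed those at another. The sketch below assumes the rank-sum form of the test (Mann-Whitney U) for two independent samples, such as two cell distributions. If values are instead matched by reporter, a signed-rank test would be the natural alternative. It uses the normal approximation with tie and continuity corrections and only the Python standard library. For real analyses, a maintained implementation such as SciPy's `scipy.stats.mannwhitneyu` (with `alternative="greater"`) would be the usual choice. The example values are hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_greater(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) test of H1: values in x tend to exceed y.

    Normal approximation with tie and continuity corrections, reasonable for large
    samples such as replicates with >= 200 cells. Returns (U for x, one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u_x - mean_u - 0.5) / math.sqrt(var_u)
    return u_x, 1 - NormalDist().cdf(z)


# Hypothetical values at two temperatures
cold = [6, 7, 8, 9, 10]
warm = [1, 2, 3, 4, 5]
u, p = rank_sum_greater(cold, warm)
print(u, round(p, 4))
```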
VYAGNEHNHAAQQPQVLDICSGL (3+, m/z = 841.06854)

Supplementary Figure 14. Evidence for peptides in Table S5 showing stop codon readthrough in the E. coli proteome. The amino acid inserted at the stop codon is shown in bold. If no amino acid is bold, the peptide lies entirely downstream of the stop codon. Underlined residues are modified: carbamidomethylation (+57) at C, oxidation (+16) at M. The data shown represent the evidence used by DIA-NN to identify the peptides. For each peptide, the left panel shows a mirror plot of empirical relative fragment intensities extracted by DIA-NN as they appear in the output spectral library (upper spectrum), compared to the Prosit prediction (lower spectrum; Prosit spectra predicted with NCE 31, taking into account the offset of 6 between Prosit and Orbitrap NCE 1).
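The precursor listed above (charge 3+, m/z 841.06854) and the mass errors in ppm shown in the chromatogram panels rest on simple arithmetic. The sketch below computes a theoretical precursor m/z from standard monoisotopic residue masses and derives the ppm error of an observed value. It assumes every cysteine is carbamidomethylated, an optional number of oxidized methionines, and no other modifications. It is an illustration, not DIA-NN's or Skyline's code, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # assumed fixed modification on every C
OXIDATION = 15.994915        # variable modification on M


def precursor_mz(peptide, charge, oxidized_met=0):
    """Theoretical [M + zH]z+ m/z, assuming all cysteines are carbamidomethylated."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidized methionines than methionines in the peptide")
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += OXIDATION * oxidized_met
    return (mass + charge * PROTON) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


theoretical = precursor_mz("VYAGNEHNHAAQQPQVLDICSGL", charge=3)
print(round(theoretical, 4), round(ppm_error(841.06854, theoretical), 2))
```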

Table 1. Friedman (when comparing three replicates) and sign (when comparing two replicates) two-sided tests confirm a high level of cell-to-cell variability, indicating significant differences among biological replicates.
Replicates with fewer than 200 cells were excluded from the analysis. Empty table cells represent instances where only one replicate had more than 200 cells and the test could not be performed. Source data is provided as Source data file.
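Both tests named in Table 1 need matched observations across replicates. How cells were matched between replicates is not stated here, so the pairing (for example, summary values matched across replicates) is an assumption in this sketch. The code implements an exact two-sided sign test for two replicates and a Friedman test for three replicates. With three groups the Friedman statistic has 2 degrees of freedom, and the chi-square survival function with 2 degrees of freedom is exactly exp(-x/2). The sketch omits the Friedman tie correction, and the function names and data are hypothetical.

```python
import math


def sign_test_two_sided(a, b):
    """Exact two-sided sign test for paired values a[i], b[i]; ties are discarded."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1))
    return min(1.0, 2 * tail / 2 ** n)


def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_three(blocks):
    """Friedman test for three replicates; each block holds one value per replicate.

    Returns (chi-square statistic, p-value) using the 2-df chi-square approximation.
    """
    n, k = len(blocks), 3
    rank_sums = [0.0] * k
    for block in blocks:
        if len(block) != k:
            raise ValueError("each block needs exactly three values")
        for j, r in enumerate(_average_ranks(block)):
            rank_sums[j] += r
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return q, math.exp(-q / 2)


# Hypothetical matched values: replicate 1 > replicate 2 > replicate 3 in every block
blocks = [[3, 2, 1], [30, 20, 10], [9, 5, 4], [7, 6, 0]]
q, p_friedman = friedman_three(blocks)
p_sign = sign_test_two_sided([5, 6, 7, 8, 9, 10, 11, 12], [1] * 8)
print(q, p_friedman, p_sign)
```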

Table 3. Wilcoxon one-sided testing, adjusted using the Bonferroni method, shows statistical evidence of a non-linear temperature-driven effect on SCR: 18°C > 25°C ~ 37°C > 42°C.
We assessed the median fluorescence relative to the wild-type for all the reporters studied in LB and M9 media (dataset from Fig S2), excluding those with no SCR at any of the studied temperatures. Source data is provided as Source data file.
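The Bonferroni adjustment named in Table 3 multiplies each raw p-value by the number of comparisons and caps the result at 1. A minimal sketch follows. The comparison labels and raw p-values are hypothetical and do not come from the table.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical raw one-sided p-values for adjacent temperature comparisons
raw = {"18C>25C": 0.001, "25C>37C": 0.2, "37C>42C": 0.01}
adjusted = bonferroni(raw)
significant = [name for name, p in adjusted.items() if p < 0.05]
print(significant)  # ['18C>25C', '37C>42C']
```

In this toy case, the non-significant middle comparison is the kind of result that would be written as "25°C ~ 37°C" in a chain such as the one above.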